In the ever-evolving landscape of digital content, the ability to process vast amounts of unstructured data has become a game-changer.
Today, we're excited to share how we used Mixpeek to transform 1,000 top movie trailers from simple video links into rich, structured data, opening up new possibilities for content discovery and analysis in the film industry.
The Challenge
The film industry generates an enormous amount of visual content, with trailers being a crucial component of movie marketing and audience engagement. However, traditional methods of cataloging and searching through this content are often limited to basic text-based metadata, missing out on the rich visual and auditory information contained within the trailers themselves.
Our challenge was to take a simple list of movie trailer URLs and transform them into a comprehensive, searchable database of film content.
Before: Raw Trailer Data
Initially, our data looked something like this:
{
  "title": "2001: A Space Odyssey",
  "trailer_url": "https://www.youtube.com/watch?v=oR_e9y-bka0"
}
This minimal information provided little opportunity for advanced search or analysis. Our goal was to enrich this data significantly.
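The implementation script below reads these records from the "movies" collection of a MongoDB database named "demos", so the raw list has to be loaded there first. A minimal sketch, assuming the raw list is kept as a JSON array of {"title", "trailer_url"} objects. parse_raw_trailers and seed_collection are hypothetical helpers, and the collection argument is anything with a pymongo-style insert_many method.

```python
import json
from typing import Any, Dict, Iterable, List

# Fields the processing script relies on for every movie record
REQUIRED_KEYS = ("title", "trailer_url")


def parse_raw_trailers(text: str) -> List[Dict[str, Any]]:
    """Parse a JSON array of raw trailer records, dropping rows missing a title or URL."""
    rows = json.loads(text)
    return [row for row in rows if all(row.get(key) for key in REQUIRED_KEYS)]


def seed_collection(collection: Any, rows: Iterable[Dict[str, Any]]) -> int:
    """Insert rows via a pymongo-style insert_many and return how many were inserted."""
    docs = list(rows)
    if not docs:
        # pymongo's insert_many rejects an empty list, so skip the call entirely
        return 0
    result = collection.insert_many(docs)
    return len(result.inserted_ids)
```

With pymongo, this would be called as seed_collection(client["demos"]["movies"], parse_raw_trailers(raw_json)), where raw_json holds the file contents.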
The Mixpeek Solution
Mixpeek provided us with the perfect toolkit to tackle this challenge. As a developer platform specializing in processing unstructured data, Mixpeek offers a suite of tools for extraction, embedding, and generation across various modalities, including text, image, video, and audio.
Here's how Mixpeek helped us address it:
- Automated Processing: Mixpeek's S3 integration allowed us to set up an automated pipeline that processes new video content as soon as it's added to our designated S3 bucket.
- Multi-modal Analysis: Using Mixpeek's video processing capabilities, we were able to extract rich information from the trailers, including visual elements, audio features, and textual content from speech or captions.
- Data Synchronization: Mixpeek automatically keeps the structured data in sync with our downstream database, ensuring that our search and recommendation systems always have access to the most up-to-date information.
- Data Structuring: Mixpeek transformed our unstructured video data into a well-organized, structured format, making it easy to query and analyze.
- Metadata Generation: Leveraging its advanced AI capabilities, Mixpeek automatically generated tags, descriptions, and detailed metadata for each trailer.
The Implementation
Here's a simplified version of our implementation:
import os
import pymongo
from mixpeek import Mixpeek
from pytube import YouTube
from youtubesearchpython import VideosSearch
# Initialize the Mixpeek client (replace with your own API key)
mixpeek = Mixpeek('your-api-key-here')
# Helper: look up a trailer URL by title (e.g. for records that lack one)
def get_youtube_trailer_url(query):
    videos_search = VideosSearch(query, limit=1)
    return videos_search.result()["result"][0]["link"]
# Download the highest-resolution progressive MP4 stream and return its local path
def download_youtube_video(url, title):
    yt = YouTube(url)
    stream = (
        yt.streams.filter(progressive=True, file_extension='mp4')
        .order_by('resolution')
        .desc()
        .first()
    )
    if stream is None:
        raise RuntimeError(f"No progressive MP4 stream available for {url}")
    # download() returns the path of the saved file
    return stream.download(filename=f"{title}.mp4")
def main():
    # MongoDB connection setup
    client = pymongo.MongoClient("your-mongodb-connection-string")
    db = client["demos"]
    collection = db["movies"]
    for movie in collection.find():
        print(f"Processing movie: {movie['title']}")
        video_path = None
        try:
            video_path = download_youtube_video(movie["trailer_url"], movie["title"])
            # Upload through the Mixpeek storage connection under the "movies" prefix
            response = mixpeek.connections.storage.upload(
                connection_id="your-connection-id",
                file_path=video_path,
                prefix="movies"
            )
            print(f"Upload successful: {response}")
            # Save the uploaded file's URL on the movie record
            collection.update_one({"_id": movie["_id"]}, {"$set": {"full_url": response["full_url"]}})
        except Exception as e:
            print(f"Failed to process {movie['title']}: {e}")
        finally:
            # Remove the local copy even if the upload failed
            if video_path and os.path.exists(video_path):
                os.remove(video_path)
if __name__ == "__main__":
    main()
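This script downloads each trailer listed in our MongoDB collection, uploads it through Mixpeek's storage connection under the "movies" prefix, saves the uploaded file's URL back to the movie's record, and removes the local copy. Because the upload lands in the S3 bucket that Mixpeek watches, processing starts automatically and produces the enriched records shown next.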
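One detail worth handling: the script names each download after the raw movie title, so "2001: A Space Odyssey" yields a file name containing a colon, which Windows does not allow, and the file URL in the output below ends up with unencoded spaces and a colon. A minimal sketch of two hypothetical helpers (safe_filename and object_url, not part of pytube or Mixpeek) that sanitize the title and percent-encode the URL path; the base URL passed in is a placeholder.

```python
import re
from urllib.parse import quote

# Characters Windows forbids in file names, plus ASCII control characters
_UNSAFE = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(title: str, ext: str = "mp4") -> str:
    """Replace unsafe characters in a title with '_' and append the extension."""
    cleaned = _UNSAFE.sub("_", title).strip().rstrip(".")
    return f"{cleaned or 'untitled'}.{ext}"


def object_url(base_url: str, prefix: str, filename: str) -> str:
    """Join base URL, prefix and filename, percent-encoding the path segments."""
    return f"{base_url.rstrip('/')}/{quote(prefix)}/{quote(filename)}"
```

Stripping trailing dots matters because Windows also silently drops them from file names; an all-unsafe title falls back to "untitled".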
After: Structured and Enriched Data
{
  "embedding": [
    -0.5492609739303589,
    0.6699835062026978,
    0.0106736421585083,
    // ... (768-dimensional vector)
  ],
  "file_url": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/ix-0Sqm__ZbVpAIHRkOJIyqEyhGEaNXs2dvY6WXJ6o6GkEzI0lXfR7S-qBimKKhI_OS8Bw/movies/2001: A Space Odyssey.mp4",
  "start_time": 0,
  "end_time": 150.25,
  "fps": 24,
  "duration": 150.25,
  "resolution": [640, 360],
  "size_kb": 22170,
  "movie_info": {
    "_id": {"$oid": "66734b8fb8c3476457015a90"},
    "rank": "1",
    "title": "2001: A Space Odyssey",
    "director": "Kubrick",
    "year": "1968",
    "trailer_url": "https://www.youtube.com/watch?v=oR_e9y-bka0",
    "full_url": "https://mixpeek-public-demo.s3.us-east-2.amazonaws.com/ix-0Sqm__ZbVpAIHRkOJIyqEyhGEaNXs2dvY6WXJ6o6GkEzI0lXfR7S-qBimKKhI_OS8Bw/movies/2001: A Space Odyssey.mp4"
  },
  "tags": [
    "science fiction",
    "space exploration",
    "artificial intelligence",
    "human evolution",
    "Stanley Kubrick"
  ],
  "description": "A groundbreaking science fiction film that spans from the dawn of man to humanity's venture into deep space, exploring themes of evolution, technology, and the nature of consciousness.",
  "metadata": {
    "genre": ["Science Fiction", "Adventure", "Drama"],
    "awards": ["Academy Award for Best Visual Effects"],
    "rating": "G",
    "language": "English",
    "notable_scenes": [
      {
        "timestamp": 15.5,
        "description": "The iconic 'Dawn of Man' sequence"
      },
      {
        "timestamp": 74.2,
        "description": "The famous 'Star Gate' sequence"
      }
    ],
    "key_characters": [
      "Dr. David Bowman",
      "HAL 9000",
      "Dr. Frank Poole"
    ],
    "cinematographer": "Geoffrey Unsworth",
    "music_composer": "Various classical pieces, including works by Richard Strauss and György Ligeti"
  }
}
As you can see, we've transformed a simple URL into a rich, multi-faceted representation of the movie trailer. This structured output includes:
- Embedding: A 768-dimensional vector representation of the video content, enabling advanced similarity searches.
- File Information: Details about the processed video file, including its URL, start and end times, frame rate, duration, resolution, and size.
- Movie Metadata: Structured information about the movie itself, including its title, director, year, and relevant URLs.
- Tags: A list of keywords or phrases that categorize the content of the trailer and the movie.
- Description: A brief summary of the movie, providing context and key themes.
- Extended Metadata: Additional structured information about the movie, including genre, awards, rating, language, notable scenes, key characters, and more.
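To illustrate what the embedding enables, here is a brute-force cosine-similarity search over records shaped like the output above. It is a sketch, not Mixpeek's search API: it assumes the query vector comes from the same 768-dimensional embedding model used to index the trailers, and top_k_similar is a hypothetical helper. A linear scan is fine for around 1,000 trailers; larger collections would use a vector index instead.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query: Sequence[float], records: List[Dict], k: int = 5) -> List[Tuple[str, float]]:
    """Return (title, score) pairs for the k records whose embeddings best match the query."""
    scored = [
        (record["movie_info"]["title"], cosine_similarity(query, record["embedding"]))
        for record in records
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Cosine similarity ignores vector length and compares only direction, which is the usual choice for comparing embeddings.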
The Value
This transformation unlocks a wealth of possibilities:
- Enhanced Content Discovery: Users can now search for movies based on specific visual or conceptual elements within the trailers.
- Timestamp-based Queries: Enables searching for specific moments or scenes within trailers.
- Multi-modal Search: Supports text, image, and video-based searches.
- Automated Data Pipeline: The entire process from ingestion to indexing is automated.
- Scalability: New trailers are automatically processed and indexed as they're added.
- Rich Metadata: AI-generated tags, descriptions, and metadata enable sophisticated analysis and search.
- Improved Recommendation Systems: Detailed metadata and embeddings allow for more nuanced content recommendations.
- Content Analysis: Researchers can analyze trends in film marketing over time.
- Educational Applications: Film students can quickly find examples of specific techniques or narrative structures.
- Accessibility Features: Detailed scene descriptions could be used to generate audio descriptions for visually impaired users.
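Timestamp-based and tag-based queries can work directly off the structured fields. A minimal sketch over records shaped like the example output, assuming the enriched records are stored in a MongoDB collection: scenes_between (a hypothetical helper) filters notable_scenes to a time window, and build_filter (also hypothetical) returns a MongoDB query document using the standard $all operator and dot notation.

```python
from typing import Dict, List, Optional


def scenes_between(record: Dict, start: float, end: float) -> List[Dict]:
    """Return notable scenes whose timestamp (in seconds) lies within [start, end]."""
    scenes = record.get("metadata", {}).get("notable_scenes", [])
    return [scene for scene in scenes if start <= scene["timestamp"] <= end]


def build_filter(tags: List[str], genre: Optional[str] = None) -> Dict:
    """Build a MongoDB filter matching records that carry all given tags."""
    query: Dict = {"tags": {"$all": tags}}
    if genre is not None:
        # Matching a scalar against an array field matches if any element equals it
        query["metadata.genre"] = genre
    return query
```

With pymongo, collection.find(build_filter(["space exploration"], genre="Science Fiction")) would return records tagged "space exploration" whose genre list includes "Science Fiction".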
Who Would Use It?
This solution has potential applications for various stakeholders:
- Film Studios and Marketers
- Streaming Platforms
- Film Researchers and Critics
- Audience Engagement Teams
- Content Creators
- Data Scientists and AI Researchers
- Film Educators
- Movie Enthusiasts
Conclusion
By leveraging Mixpeek's powerful capabilities, we've created a system that transforms how we interact with and analyze movie trailers. We've gone from simple video links to rich, structured data that opens up new frontiers in content discovery, analysis, and engagement in the world of cinema and beyond.
As we continue to refine and expand this system, we're excited about the potential applications not just for movie trailers, but for all types of video content. The future of AI-powered video analysis is here, and it's transforming how we understand and interact with visual media.